DART: Dropouts meet Multiple Additive Regression Trees

Authors

  • K. V. Rashmi
  • Ran Gilad-Bachrach

Abstract

MART (Friedman, 2001, 2002), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers from an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances and make negligible contributions towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added trees. We show that the commonly used tool to address this issue, shrinkage, alleviates the problem only to a certain extent, and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem: employing dropouts, a tool that has recently been proposed in the context of learning deep neural networks (Hinton et al., 2012). We propose a novel way of employing dropouts in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large-scale, publicly available datasets, and show that DART outperforms MART in each of the tasks by a significant margin. We also show that DART overcomes the issue of over-specialization to a considerable extent.

Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38. Copyright 2015 by the authors.
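The abstract only names the idea of applying dropouts to MART. As a rough illustration, the Python sketch below is a minimal, hypothetical DART loop, not the authors' code. It assumes squared-error loss, one-dimensional inputs, and single-split regression stumps in place of full regression trees, and the names `fit_stump`, `predict_stump`, `dart_fit`, `dart_predict` and `drop_rate` are invented for this example. At each iteration every existing tree is dropped independently with probability `drop_rate`, a new tree is fit to the residuals of the remaining ensemble, and the new tree and the k dropped trees are reweighted. The weighting used here (1/(k+1) for the new tree, k/(k+1) for each dropped tree) is an assumption that mirrors the "tree" normalization of XGBoost's `dart` booster with a learning rate of 1.

```python
import random


def fit_stump(xs, ys):
    """Least-squares regression stump on 1-D inputs.

    Returns (threshold, left_value, right_value); points with x <= threshold
    get left_value, the rest get right_value.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    # No-split fallback: predict the mean everywhere.
    best_score = total * total / n
    best = (float("inf"), total / n, total / n)
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot split between equal x values
        n_left, n_right = i + 1, n - i - 1
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing the split's SSE.
        score = left_sum ** 2 / n_left + right_sum ** 2 / n_right
        if score > best_score:
            best_score = score
            threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
            best = (threshold, left_sum / n_left, right_sum / n_right)
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def dart_fit(xs, ys, n_trees=50, drop_rate=0.1, seed=0):
    """Hypothetical DART sketch for squared-error loss.

    Returns a list of [stump, weight] pairs.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_trees):
        # Dropout: each existing tree is dropped independently with prob. drop_rate.
        dropped = [j for j in range(len(ensemble)) if rng.random() < drop_rate]
        dropped_set = set(dropped)
        kept = [j for j in range(len(ensemble)) if j not in dropped_set]
        # Prediction of the ensemble with the dropped trees removed.
        partial = [
            sum(ensemble[j][1] * predict_stump(ensemble[j][0], x) for j in kept)
            for x in xs
        ]
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, partial)]
        stump = fit_stump(xs, residuals)
        # Assumed normalization (k = number of dropped trees): the new tree gets
        # weight 1/(k+1) and each dropped tree is scaled by k/(k+1).
        k = len(dropped)
        for j in dropped:
            ensemble[j][1] *= k / (k + 1)
        ensemble.append([stump, 1.0 / (k + 1)])
    return ensemble


def dart_predict(ensemble, x):
    return sum(weight * predict_stump(stump, x) for stump, weight in ensemble)
```

With `drop_rate=0` no tree is ever dropped, so k = 0 at every step and the loop reduces to plain gradient boosting without shrinkage. At the other extreme, `drop_rate=1` drops every tree at every step: each new tree is then fit to the raw targets, and the normalization leaves all n trees with weight 1/n, so the ensemble's output stays at the scale of a single tree instead of growing n-fold.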

Similar resources

Modeling MOOC Dropouts

In this project, we model MOOC dropouts using user activity data. We have several rounds of feature engineering and generate features like activity counts, percentage of visited course objects, and session counts to model this problem. We apply logistic regression, support vector machine, gradient boosting decision trees, AdaBoost, and random forest to this classification problem. Our best mode...

PSMART: Parameter Server based Multiple Additive Regression Trees System

In this paper, we describe a Parameter Server based Multiple Additive Regression Trees system, or PSMART for short. Empirically, PSMART scales MART to hundreds of billions of samples and thousands of features in production.

Enhanced Prediction of Student Dropouts Using Fuzzy Inference System and Logistic Regression

Predicting college and school dropouts is a major problem in the educational system and a complicated challenge due to data imbalance and multidimensionality, which can affect the low performance of students. In this paper, we have collected different databases from various colleges; among these, 500 best real attributes are identified in order to identify the factors affecting dropout student...

Predictive factors of glycosylated hemoglobin using additive regression model

Introduction: Diabetes is a chronic, non-epidemic disease that costs a great deal of money each year. One of the diagnostic criteria for diabetes is glycosylated hemoglobin (HbA1c); in this study, the factors affecting it were examined using an additive regression model. Materials and Methods: In this cross-sectional study, 130 patients with type 2 diabetes were selected based on simple random...

Additive Groves of Regression Trees

We present a new regression algorithm called Additive Groves and show empirically that it is superior in performance to a number of other established regression methods. A single Grove is an additive model containing a small number of large trees. Trees added to a Grove are trained on the residual error of other trees already in the model. We begin the training process with a single small tree ...

Journal title:
  • CoRR

Volume: abs/1505.01866

Publication date: 2015